pacman::p_load(sf, sfdep, tmap, tidyverse, patchwork, DT, mapview, leaflet, scales)
# - Creates a package list containing the necessary R packages
# - Checks if the R packages in the package list have been installed
# - If not installed, will install the missing packages & launch into R environment.Take-home Exercise 1: Geospatial Analytics for Public Good
1 Overview
1.1 Background
As urban infrastructures, including public transportation systems like buses, taxis, mass rapid transit, public utilities, and roads, become increasingly digitised, the data generated becomes a valuable resource for tracking the movements of people and vehicles over space and time. This transformation has been facilitated by pervasive computing technologies such as Global Positioning System (GPS) and Radio Frequency Identification (RFID) tags on vehicles. An example of this is the collection of data on bus routes and ridership, amassed from the use of smart cards and GPS devices available on public buses.
The data collected from these sources is likely to contain valuable patterns that offer insights into various aspects of human movement and behavior within a city. Analyzing and comparing these patterns can provide a deeper understanding of urban mobility. Such insights can be instrumental in improving urban management and can also serve as valuable information for both public and private sector stakeholders involved in urban transportation services. This information can aid them in making informed decisions and gaining a competitive edge in their respective domains.
1.2 Objectives
The objective of this study is to utilize appropriate Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA) techniques to uncover spatial and spatio-temporal mobility patterns among public bus passengers in Singapore.
This will include the following tasks:
- Geovisualisation and Analysis:
- Compute the passenger trips generated by origin at the hexagon level
- Display the geographical distribution of the passenger trips
- Explore spatial patterns revealed by the geovisualisation
| Peak hour period | Bus tap on time |
|---|---|
| Weekday morning peak | 6am to 9am |
| Weekday afternoon peak | 5pm to 8pm |
| Weekend/holiday morning peak | 11am to 2pm |
| Weekend/holiday evening peak | 4pm to 7pm |
- Local Indicators of Spatial Association (LISA) Analysis - Compute LISA of the passenger trips generate by origin - Display and draw statistical conclusions of LISA maps
- Emerging Hot Spot Analysis (EHSA)
- Perform Mann-Kendall Test by using the spatio-temporal local Gi* values
- Display EHSA maps of the Gi* values, describe the spatial patterns revealed
2 Loading Packages
The following packages will be used for this exercise:
The code chunk below, using p_load function of the pacman package, ensures that sfdep, sf, tmap and tidyverse packages are installed and loaded in R.
3 Data Preparation
3.1 The Data
The following data are used for this study:
- Aspatial:
- Passenger Volume by Origin Destination Bus Stops for August, September and October 2023, downloaded from LTA DataMall using API.
- Geospatial
- Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
- hexagon, a hexagon layer of 250m is provided to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.
3.2 Import & Preparation
3.2.1 Aspatial
::: panel-tabset ## Import into R
We will be importing the Passenger Volume by Origin Destination Bus Stops dataset from August to October 2023, downloaded from LTA DataMall by using read_csv() or readr package.
odbus08 <- read_csv("data/aspatial/origin_destination_bus_202308.csv")
#odbus09 <- read_csv("data/aspatial/origin_destination_bus_202309.csv")
#odbus10 <- read_csv("data/aspatial/origin_destination_bus_202310.csv")Data Exploration
(a) Attributes
glimpse() of the dplyr package allows us to see all columns and their data type in the data frame.
glimpse(odbus08)Rows: 5,709,512
Columns: 7
$ YEAR_MONTH <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <chr> "04168", "04168", "80119", "80119", "44069", "4406…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "1722…
$ TOTAL_TRIPS <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …
#glimpse(odbus09)
#glimpse(odbus10)Insights:
- There are 7 variables in the odbus08 tibble data, they are:
- YEAR_MONTH: Month in which data is collected
- DAY_TYPE: Weekdays or weekends/holidays
- TIME_PER_HOUR: Hour which the passenger trip is based on, in intervals from 0 to 23 hours
- PT_TYPE: Type of public transport, i.e. bus
- ORIGIN_PT_CODE: Origin bus stop ID
- DESTINATION_PT_CODE: Destination bus stop ID
- TOTAL_TRIPS: Number of trips
- We also note that values in ORIGIN_PT_CODE and DESTINATON_PT_CODE are in numeric data type.
(b) Unique Bus Stops
n_distinct() of the dplyr package allows us to count the unique bus stops in the data set.
n_distinct(odbus08$ORIGIN_PT_CODE)[1] 5067
The results reveal that there are 5067 distinct origin bus stops.
Data Wrangling
(a) Convert Data Type
as.factor() can be used to convert the variables ORIGIN_PT_CODE and DESTINATON_PT_CODE from numeric to categorical data type. We use glimpse() again to check the results.
odbus08$ORIGIN_PT_CODE <- as.factor(odbus08$ORIGIN_PT_CODE)
odbus08$DESTINATION_PT_CODE <- as.factor(odbus08$DESTINATION_PT_CODE)
glimpse(odbus08)Rows: 5,709,512
Columns: 7
$ YEAR_MONTH <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <fct> 04168, 04168, 80119, 80119, 44069, 44069, 20281, 2…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 17229, 20141, 2…
$ TOTAL_TRIPS <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …
Note that both of them are in factor data type now.
(b) Duplicates Check
Before moving on to the next step, it is a good practice for us to check for duplicated records to prevent double counting of passenger trips.
duplicate <- odbus08 %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate# A tibble: 0 × 7
# ℹ 7 variables: YEAR_MONTH <chr>, DAY_TYPE <chr>, TIME_PER_HOUR <dbl>,
# PT_TYPE <chr>, ORIGIN_PT_CODE <fct>, DESTINATION_PT_CODE <fct>,
# TOTAL_TRIPS <dbl>
Results confirm that there are no duplicated records found.
(c) Extracting the Study Data
In our study, we would like to know patterns for 4 peak hour periods. Therefore, we can create a new variable period using the ifelse() that states whether an observation occurred during peak period using the code chunk below.
peak <- odbus08 %>%
# Weekday morning peak
mutate(period= ifelse(DAY_TYPE=="WEEKDAY" & (TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9), "WDM",
# Weekday afternoon peak
ifelse(DAY_TYPE=="WEEKDAY" & (TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20), "WDA",
# Weekend/holiday morning peak
ifelse(DAY_TYPE=="WEEKENDS/HOLIDAY" & (TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14), "WEM",
# Weekend/holiday evening peak
ifelse(DAY_TYPE=="WEEKENDS/HOLIDAY" & (TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19), "WEE",
# Return off-peak if neither of the peak hour periods
"Off-peak")))))We can then filter for peak-period data using the newly created period column and aggregate the total trips for each origin bus stop during peak period.
peakperiods <- peak %>%
# filter helps to keep records that occurred during period periods
filter(period !="Off-peak") %>%
# aggregate the total passenger trips for each origin bus stop
group_by(period, ORIGIN_PT_CODE) %>%
summarise(TRIPS=sum(TOTAL_TRIPS))Let’s visualise the proportions of passenger volumes for each peak period.
Show the code
freq<- ggplot(data=peakperiods,
aes(x=period,y=TRIPS))+
geom_bar(stat="identity") +
theme(legend.position="none")+
labs(title = "Frequency of Trip for each Peak Period",
x = "Peak Period",
y = "Frequency")
freq + scale_y_continuous(labels=label_comma())
We can see that passenger volume on weekdays are much higher than over the weekends/holidays.
Transpose each peak period period as a columns using pivot_wider() of tidyr package will allow us to create further variables at a bus stop level. We replace NA values with 0 to reflect when there are no traffic for certain periods.
peakperiods_wide <- pivot_wider(peakperiods,
names_from = "period",
values_from = "TRIPS")
peakperiods_wide["WDA"][is.na(peakperiods_wide["WDA"])] <- 0
peakperiods_wide["WDM"][is.na(peakperiods_wide["WDM"])] <- 0
peakperiods_wide["WEE"][is.na(peakperiods_wide["WEE"])] <- 0
peakperiods_wide["WEM"][is.na(peakperiods_wide["WEM"])] <- 0
glimpse(peakperiods_wide)Rows: 5,067
Columns: 5
$ ORIGIN_PT_CODE <fct> 01012, 01013, 01019, 01029, 01039, 01059, 01109, 01112,…
$ WDA <dbl> 8448, 7328, 3608, 9317, 12937, 2133, 322, 45010, 27233,…
$ WDM <dbl> 1973, 952, 1789, 2561, 2938, 1651, 161, 8492, 9015, 424…
$ WEE <dbl> 3208, 2796, 1623, 4244, 7403, 1190, 88, 21706, 11927, 6…
$ WEM <dbl> 2273, 1697, 1511, 3272, 5424, 1062, 89, 14964, 8278, 61…
Notice that there are 5067 unique origin bus stops.
(d) Variable Transformation
Show the code
# Extract column
distWDM <- peakperiods_wide$WDM
# Calculate mean
distWDM_mean <- mean(distWDM)
plot_distWDM <- ggplot(
data = data.frame(distWDM),
aes(x = distWDM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWDM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 80000,
y = 2000,
label = paste("Mean =", round(distWDM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
# Extract column
distWDA <- peakperiods_wide$WDA
# Calculate mean
distWDA_mean <- mean(distWDA)
plot_distWDA <- ggplot(
data = data.frame(distWDA),
aes(x = distWDA)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWDA_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 110000,
y = 2000,
label = paste("Mean =", round(distWDA_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Afternoon Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
# Extract column
distWEM <- peakperiods_wide$WEM
# Calculate mean
distWEM_mean <- mean(distWEM)
plot_distWEM <- ggplot(
data = data.frame(distWEM),
aes(x = distWEM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWEM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 23000,
y = 2000,
label = paste("Mean =", round(distWEM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekdend/Holiday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
# Extract column
distWEE <- peakperiods_wide$WEE
# Calculate mean
distWEE_mean <- mean(distWEE)
plot_distWEE <- ggplot(
data = data.frame(distWEE),
aes(x = distWEE)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWEE_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 29000,
y = 2000,
label = paste("Mean =", round(distWEE_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekend/Holiday Evening Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
(plot_distWDM | plot_distWDA)/
(plot_distWEM | plot_distWEE)
The distribution of passenger trips for the 4 peak periods appear to be highly skewed to the right. Rescaling our data using log transformation can greatly reduce the skewness.
peakperiods_wider <- peakperiods_wide %>%
mutate(logWDM = ifelse(WDM == 0, 0, log(WDM)),
logWDA = ifelse(WDA == 0, 0, log(WDA)),
logWEM = ifelse(WEM == 0, 0, log(WEM)),
logWEE = ifelse(WEE == 0, 0, log(WEE)))Let’s visualise the distribution of the 4 peak periods again.
Show the code
# Extract column
distlogWDM <- peakperiods_wider$logWDM
# Calculate mean
distlogWDM_mean <- mean(distlogWDM)
plot_distlogWDM <- ggplot(
data = data.frame(distlogWDM),
aes(x = distlogWDM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWDM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 10,
y = 1000,
label = paste("Mean =", round(distlogWDM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
# Extract column
distlogWDA <- peakperiods_wider$logWDA
# Calculate mean
distlogWDA_mean <- mean(distlogWDA)
plot_distlogWDA <- ggplot(
data = data.frame(distlogWDA),
aes(x = distlogWDA)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWDA_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 10,
y = 1000,
label = paste("Mean =", round(distlogWDA_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Afternoon Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
# Extract column
distlogWEM <- peakperiods_wider$logWEM
# Calculate mean
distlogWEM_mean <- mean(distlogWEM)
plot_distlogWEM <- ggplot(
data = data.frame(distlogWEM),
aes(x = distlogWEM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWEM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 10,
y = 1000,
label = paste("Mean =", round(distlogWEM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekdend/Holiday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
# Extract column
distlogWEE <- peakperiods_wider$logWEE
# Calculate mean
distlogWEE_mean <- mean(distlogWEE)
plot_distlogWEE <- ggplot(
data = data.frame(distlogWEE),
aes(x = distlogWEE)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWEE_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 10,
y = 1000,
label = paste("Mean =", round(distlogWEE_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekend/Holiday Evening Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
(plot_distlogWDM | plot_distlogWDA)/
(plot_distlogWEM | plot_distlogWEE)
3.2.2 Geospatial
(a) Bus Stop Shapefile
In this section, we import BusStop shapefile into RStudio using st_read() function of sf package. This data provides the locations of all bus stops as at Q2 of 2023. crs = 3414 ensures coordinate reference system (CRS) is 3414, which is the EPSG code for the SVY21 projection used in Singapore.
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`C:\kytjy\ISSS624\Take-Home_Ex\Take-Home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
The imported shape file is simple features object of sf. From the output, we can see that there are 5161 points with 3 fields, and confirm that the datum SVY21 is correct.
Recall that there are 5067 origin bus stops from the peakperiods_wider table, compared to the 5161 bus stops from LTA’s BusStop shape file. This could be due to timing difference – LTA’s BusStop shapefile is as of July 2023, while peakperiod is based on Aug 2023.
mapview::mapview(busstop)Note that there are 5 bus stops located outside Singapore, they are bus stops 46239, 46609, 47701, 46211, and 46219.
(b) Hexagon Layer
A hexagonal grid is used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA. Hexagons have a number of advantages over these other shapes:
- The distance between the centroid of a hexagon to all neighboring centroids is the same in all directions.
- The lack of acute angles in a regular hexagon means that no areas of the shape are outliers in any direction.
- All neighboring hexagons have the same spatial relationship with the central hexagon, making spatial querying and joining a more straightforward process.
- Unlike square-based grids, the geometry of hexagons are well-structured to represent curves of geographic features which are rarely perpendicular in shape, such as rivers and roads.
- The “softer” shape of a hexagon compared to a square means it performs better at representing gradual spatial changes.
We first create a hexagonal grid layer of 250m (refers to the perpendicular distance between the centre of the hexagon and its edges) with st_make_grid, and st_sf to convert the grid into an sf object with the codes below.
st_make_grid function is used to create a grid over a spatial object. It takes 4 arguments, they are:
x: sf object; the input spatial data
cellsize: for hexagonal cells the distance between opposite edges in the unit of the crs the spatial data is using. In this case, we take cellsize to be 250m * 2 = 500m
- what: character; one of:
"polygons","corners", or"centers" - square: indicates whether you are a square grid (TRUE) or hexagon grid (FALSE)
area_hexagon_grid = st_make_grid(busstop, 500, what = "polygons", square = FALSE)Next, st_sf converts the grid created to sf object while lengths() of Base R is used to calculate the number of grids created.
# Converts grid to sf
hexagon_grid_sf = st_sf(area_hexagon_grid) %>%
# Assign unique ID to each grid
mutate(grid_id = 1:length(lengths(area_hexagon_grid)))We count the number of bus stops in each grid and keep grids with bus stops using the code chunks below.
# Create a column containing the count of bus stops in each grid
hexagon_grid_sf$busstops = lengths(st_intersects(hexagon_grid_sf, busstop))
# Remove if no bus stop in side that grid, ie only keep hexagons with bus stops
hexagon_w_busstops = filter(hexagon_grid_sf, busstops > 0)Let’s confirm that all bus stops have been accounted for in our hexagon layer.
sum(hexagon_w_busstops$busstops)[1] 5161
This is in line with the 5161 points of the busstop shapefile.
Lastly, using tm_shape of tmap, we can quickly visualise the results of the hexagon grids we have created.
Show the code
tmap_mode ("view")
hex <- tm_shape(hexagon_w_busstops)+
tm_fill(
col = "busstops",
palette = "Blues",
style = "cont",
title = "Number of Bus Stops",
id = "grid_id",
showNA = FALSE,
alpha = 0.6,
popup.format = list(
grid_id = list(format = "f", digits = 0)
)
)+
tm_borders(col = "grey40", lwd = 0.7)
hex(a) Combining Busstop and hexagon layer
Code chunk below populates the grid ID (i.e. grid_id) of hexagon_w_busstops sf data frame into busstop sf data frame.
bs_wgrids <- st_intersection(busstop, hexagon_w_busstops) %>%
select(BUS_STOP_N,BUS_ROOF_N,LOC_DESC, grid_id, busstops) %>%
st_drop_geometryst_intersection()is used to perform point and polygon overly and the output will be in point sf object.select()of dplyr package is then use to retain only BUS_STOP_N and SUBZONE_C in the busstop_mpsz sf data frame.st_stop_geometry()removes the geometry data to manipulate it like a regular dataframe usingtidyranddplyrfunctions
Before we proceed, let’s perform a duplicates check on bs_wgrids.
duplicate2 <- bs_wgrids %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate2# A tibble: 8 × 5
BUS_STOP_N BUS_ROOF_N LOC_DESC grid_id busstops
<chr> <chr> <chr> <int> <int>
1 43709 B06 BLK 644 1904 7
2 43709 B06 BLK 644 1904 7
3 58031 UNK OPP CANBERRA DR 2939 7
4 58031 UNK OPP CANBERRA DR 2939 7
5 51071 B21 MACRITCHIE RESERVOIR 3081 6
6 51071 B21 MACRITCHIE RESERVOIR 3081 6
7 97079 B14 OPP ST. JOHN'S CRES 5037 5
8 97079 B14 OPP ST. JOHN'S CRES 5037 5
Results displayed 4 genuine duplicated records. We remove these to prevent double-counting.
The code chunk below helps retain unique records.
bs_wgrids <- unique(bs_wgrids)(c) Populate PeakPeriods with Grid Details
We can now append the grid ID from bs_wgrids data frame onto peakperiods_wide data frame. Recall we previously identified 5 bus stops outside Singapore, filter() allows us to exclude the 5 outside Singapore.
origin_grid <- left_join(peakperiods_wider, bs_wgrids,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE) %>%
group_by(grid_id) %>%
# retains SG bus stops
filter(!ORIGIN_BS %in% c(46239, 46609, 47701, 46211, 46219))
glimpse(origin_grid)Rows: 5,076
Columns: 13
Groups: grid_id [1,504]
$ ORIGIN_BS <chr> "01012", "01013", "01019", "01029", "01039", "01059", "0110…
$ WDA <dbl> 8448, 7328, 3608, 9317, 12937, 2133, 322, 45010, 27233, 932…
$ WDM <dbl> 1973, 952, 1789, 2561, 2938, 1651, 161, 8492, 9015, 4240, 5…
$ WEE <dbl> 3208, 2796, 1623, 4244, 7403, 1190, 88, 21706, 11927, 6221,…
$ WEM <dbl> 2273, 1697, 1511, 3272, 5424, 1062, 89, 14964, 8278, 6198, …
$ logWDM <dbl> 7.587311, 6.858565, 7.489412, 7.848153, 7.985484, 7.409136,…
$ logWDA <dbl> 9.041685, 8.899458, 8.190909, 9.139596, 9.467847, 7.665285,…
$ logWEM <dbl> 7.728856, 7.436617, 7.320527, 8.093157, 8.598589, 6.967909,…
$ logWEE <dbl> 8.073403, 7.935945, 7.392032, 8.353261, 8.909641, 7.081709,…
$ BUS_ROOF_N <chr> "B03", "B05", "B04", "B07", "B09", "B08", "TMNL", "B07", "B…
$ LOC_DESC <chr> "HOTEL GRAND PACIFIC", "ST JOSEPH'S CH", "BRAS BASAH CPLX",…
$ grid_id <int> 3292, 3292, 3292, 3323, 3354, 3324, 3324, 3292, 3324, 3292,…
$ busstops <int> 8, 8, 8, 7, 8, 7, 7, 8, 7, 8, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7,…
(d) Retrieve Geometry
origin_gridwgeom <- inner_join(hexagon_w_busstops,
origin_grid,
by = "grid_id")
#origin_gridwgeom <- st_as_sf(origin_gridwgeom)4 Geovisualisation & Analysis
4.1 Data Classification
Different classification schemes highlight areas with the highest and/or lowest values, while others create classes that result in a more uniform distribution of colors. When data is sharply skewed or has extreme outliers, it’s important to consider whether the goal is to emphasize those areas or to achieve a more even distribution of colors and sizes.
The main methods of data classification are: - Quantile: each class contains an equal number of features. It assigns the same number of data values to each class. There are no empty classes or classes with too few or too many values - Jenks/Natural breaks: seeks clumps of values that are clustered together in order to form categories that may reflect meaningful groupings of areas - Equal: divides the range of attribute values into equal-sized sub-ranges
Since our variable is less skewed after log transformation, we can explore various classification methods for visualization. This approach may reveal interesting insights that were not immediately apparent before.
4.2 Plots
4.2.1 Weekday Morning Peak Period
Show the code
tmap_mode ("view")
plotlogWDM_q <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWDM",
style = "quantile",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWDM_qThe grids above are partitioned using the quantile intervals. We can observe that the bus trips are unevenly distributed across Singapore. There are lighter shares of blue (indicating lower levels of ridership) originating from the edges of the country, particularly in the West, while higher levels of ridership in the North region are indicated by the darker shades of blue.
Bus stops nearer to the residential estates appeared to be popular during the weekday morning peak period:
- West: BLK 821, BLK 252, Sunshine Place
- North: BLK 314
- North-East: BLK 477A, BLK 1, BLK 555, BLK 324
- East: BLK 109, BLK 124, BLK 756
This is likely due to a large number of people commuting from home to their workplaces/schools on weekday mornings.
Higher passenger traffic were noted at the bus stops nearer to MRT stations such as Harbourfront Station, Farrer Road Station, Yio Chu Kang Station, and Admirality Station. A possible contributing factor could be the people who are transiting from taking the MRTs to buses to get to their destinations.
Lastly, Woodlands Checkpoint also demonstrated higher ridership. This could potentially be due to the people commuting across the causeway from Malaysia into the Singapore borders.
Show the code
tmap_mode ("view")
plotlogWDM_e <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWDM",
style = "equal",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWDM_eThe map using equal intervals provided slightly different insights. We noted that the bus stop located near to MRT stations had higher levels of ridership. In particular, more trips originated from Tiong Bahru Station, Buona Vista Station, Tanah Merah Station, Admiralty Station, Harbourfront, and Woodleigh Station. Bus interchanges also appeared to be popular origins, i.e. Bukit Panjang Interchange and Joo Koon Interchange.
In general, more homogeneity is noted using the equal interval – the contrast between hexagon to hexagon is less obvious.
Show the code
tmap_mode ("view")
plotlogWDM_j <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWDM",
style = "jenks",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWDM_jUsing Jenk’s partitioning method, the results were largely similar to the other two types of interval classes. Higher bus ridership were spotted at bus stops within close proximity to MRT stations (Kranji Station, Buona Vista Station, Buangkok Station, Ranggung Station, Farrer Road Station, Stevens Station, Bedok Reservoir Station) and residential estates (Sunshine Place near Tengah, BLK 109 in Bedok, BLK 477A in Sengkang, Bef. BLK 629A in Woodlands, to name a few).
4.2.2 Weekday Afternoon Peak Period
Show the code
tmap_mode ("view")
plotlogWDA_q <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWDA",
style = "quantile",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWDA_qA look at the weekday afternoon ridership using the quantile classification yielded the following insights.
- Ridership from Woodlands Checkpoint remained high.
- Bus stops in close proximity to MRT tations and popular bus stops in residential estatements remained high.
- More trips originating from institutional areas: Opposite Ngee Ann Poly, Temasek Poly, NIE BLK 2, School of the Arts
- More trips originating from industrial buildings/business parks: North Link Bldg, Aft Senoko Way, Mapletree Business City, Woodlands Auto Hub, Opp Airline Hse, etc.
- More trips originating from hospitals: Yishun Community Hospital, Changi General Hospital
- Seletar Camp also looked to have high passenger levels
- The far West seemed to experience low ridership other than the bus stop opposite Tuas Link Station.
- Southern part of Singapore, consisting of more commercial areas, appeared to be more clustered as illustrated by the density of the darker red hexagons, compared to the weekday morning peak period.
Show the code
tmap_mode ("view")
plotlogWDA_e <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWDA",
style = "equal",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWDA_eNotably, there were higher concentration of passengers who boarded the bus at Serangoon Station, Harbourfront/VivoCity, Tiong Bahru Station, Admiralty Station, and Punggol Station during weekday afternoons according to the equal interval classification method.
Similar to the map for weekday morning peak period, the equal interval seemed to produce more homogeneous classifications.
Show the code
tmap_mode ("view")
plotlogWDA_j <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWDA",
style = "jenks",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWDA_jJenk’s classification delivered similar insights to the quantile classification, where the higher concentration of ridership can be observed in the Southern downtown areas.
It also highlighted Opp Airline Hse in the far East as a bus stop with high ridership, something not as visible using the equal interval method.
Alternative methods of commute might be more popular in the West and North-West regions illustrated by the lighter shades of red hexagons.
4.2.3 Weekend/Holiday Morning Peak Period
Show the code
tmap_mode ("view")
plotlogWEM_q <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWEM",
style = "quantile",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWEM_qGenerally, the distribution of bus ridership looks varied across the island.
During the weekend/holiday peak period, the ridership for far West region remained relatively low. Interestingly, the bus trips recorded from Seletar area appeared to have dipped compared to the weekday peak periods. Buses in these industrial areas could be oriented towards work-related travel, thus less common on weekends.
Bus stops nearer to housing estates, shopping malls, and Woodlands Checkpoint demonstrated higher levels of weekend morning ridership.
The bus stops with the highest boarding passengers are Sunshine Place, Opp BLK 271, BLK 252, Aft. Hasanah Mosque, Buona Vista Station, Harbourfront/Vivocity, Admiralty Station, BLK 555, Bedok Reservoir Station, BLK 22, BLK 109.
Show the code
tmap_mode ("view")
plotlogWEM_e <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWEM",
style = "equal",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWEM_eEqual interval classification highlighted the following bus stops to have the highest ridership during weekend/holiday morning peak period: Harbourfront/Vivocity, Tiong Bahru Station, Orchard Station/Lucky Plaza, Admirality Station, Aft. Punggol Road.
Show the code
tmap_mode ("view")
plotlogWEM_j <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWEM",
style = "jenks",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWEM_jThe findings noted with Jenk’s method are similar to the quantile classification.
4.2.4 Weekend/Holiday Evening
Show the code
tmap_mode ("view")
plotlogWEE_q <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWEE",
style = "quantile",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWEE_qOn weekend/holiday evenings, there seemed to be increased ridership at the bus stops near Changi Airport compared to the other peak periods.
Sunshine Plaza remains one of the most frequented bus stops, exhibiting high ridership across all four peak periods. While the exact reason for this is difficult to pinpoint, it’s possible that the buses stopping here connect to a wide variety of regions, which could explain the high ridership.
Woodlands Checkpoint also demonstrated high levels of ridership across the different peak periods.
Visually, it looks like there are more bus stops with high ridership across Singapore during the weekend/holiday evening peak period. For example, there seem to be an increase in passenger volume at the Tanah Merah Ferry compared to the other peak periods.
Show the code
tmap_mode ("view")
plotlogWEE_e <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWEE",
style = "equal",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWEE_eThe bus stops with higher traffic seem to be quite consistent across the different peak periods. This includes Woodlands Checkpoint, Kranji Station, Admiralty Station, Serangoon Station, Aft. Punggol Road, Bukit Panjang MRT, Yio Chu Kang Interchange.
Show the code
tmap_mode ("view")
plotlogWEE_j <- tm_basemap("CartoDB.Positron") +
tm_shape(origin_gridwgeom) +
tm_fill("logWEE",
style = "jenks",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC")
plotlogWEE_jThe findings noted with Jenk’s method are similar to the quantile classification.
5 Local Indicators of Spatial Association Analysis
6 Emerging Hot Spot Analysis
7 Limitations
Should be analysed in conjunction with other modes of public transportation